An efficient algorithm for computing the edit distance of a regular language via input-altering transducers

Authors

Lila Kari

Stavros Konstantinidis

Steffen Kopecki

Meng Yang

Abstract

We revisit the problem of computing the edit distance of a regular language given via an NFA. This problem relates to the inherent maximal error-detecting capability of the language in question. We present an efficient algorithm for solving this problem which executes in time O(rnd), where r is the cardinality of the alphabet involved, n is the number of transitions in the given NFA, and d is the computed edit distance. We have implemented the algorithm and present here performance tests. The correctness of the algorithm is based on the result (also presented here) that the particular error-detection property related to our problem can be defined via an input-altering transducer.
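
To make the quantity in the abstract concrete: the edit distance of a language is the smallest Levenshtein distance between two distinct words of that language. The sketch below computes it by brute force for a small finite language given as a set of strings. This is not the O(rnd) NFA-based algorithm of the paper, whose details are not given here; the function names `levenshtein` and `language_edit_distance` are illustrative assumptions, and the pairwise comparison is only practical for small word sets.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two words (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def language_edit_distance(words) -> int:
    """Brute-force edit distance of a FINITE language (illustration only).

    Takes the minimum over all pairs of distinct words. A regular language
    given by an NFA may be infinite, which is why the paper needs a
    different, automaton-based method.
    """
    distinct = set(words)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct words")
    return min(levenshtein(u, v) for u, v in combinations(distinct, 2))


if __name__ == "__main__":
    print(language_edit_distance({"000", "111"}))
    print(language_edit_distance({"ab", "abc", "ba"}))
```

For the repetition code {"000", "111"} every symbol must change to turn one word into the other, so the distance is 3, whereas {"ab", "abc", "ba"} has distance 1 because a single insertion turns "ab" into "abc".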

Similar resources

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

Prefix Distance Between Regular Languages

The prefix distance between two words x and y is defined as the number of symbol occurrences in the words that do not belong to the longest common prefix of x and y. We show how to model the prefix distance using weighted transducers. We use the weighted transducers to compute the prefix distance between two regular languages by a transducer-based approach originally used by Mohri for an algori...

Edit-Distance Of Weighted Automata: General Definitions And Algorithms

The problem of computing the similarity between two sequences arises in many areas such as computational biology and natural language processing. A common measure of the similarity of two strings is their edit-distance, that is the minimal cost of a series of symbol insertions, deletions, or substitutions transforming one string into the other. In several applications such as speech recognition...

Computing the edit distance of a regular language

The edit distance (or Levenshtein distance) between two words is the smallest number of substitutions, insertions, and deletions of symbols that can be used to transform one of the words into the other. In this paper we consider the problem of computing the edit distance of a regular language (also known as constraint system), that is, the set of words accepted by a given finite automaton. This...

Property and Equivalence Testing on Strings

We investigate property testing and related questions, where instead of the usual Hamming and edit distances between input strings, we consider the more relaxed edit distance with moves. Using a statistical embedding of words which has similarities with the Parikh mapping, we first construct a tolerant tester for the equality of two words, whose complexity is independent of the string size, and...

Journal title:

CoRR

Volume abs/1406.1041

Publication date 2013
